Integrate Automated QDQ autotuner - part 3.2 by willg-nv · Pull Request #838 · NVIDIA/Model-Optimizer

willg-nv · 2026-02-02T02:58:02Z

What does this PR do?

This PR implements QDQAutotuner class. This class is used to drive the main Autotuner workflow.

The workflow is:

uses RegionSearch to build regions
generate QDQ ONNX models and evaluate perf
save best model

This PR is part 2/4 of #703.

PR 3.1: #837
PR 3.2 #838
PR 3.3: #839

Overview: ?

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes
Did you write any new necessary tests?: Not in this part.
Did you add or update any necessary documentation?: No, document will be updated in part 4.
Did you update Changelog?: No, change log will be updated when all changes are ready.

Additional Information

Summary by CodeRabbit

New Features
- Introduced ONNX Q/DQ autotuning framework with automatic region discovery and pattern-based optimization.
- Added model profiling and quantization scheme generation capabilities.
- Enabled state persistence and quantization model export functionality.
- Introduced configuration management for quantization parameters and profiling workflows.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

copy-pr-bot · 2026-02-02T02:58:06Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-02-02T02:58:26Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

Introduces a new ONNX quantization autotuning module that enables automatic Q/DQ (Quantize/Dequantize) node insertion and optimization using pattern-based region analysis. Provides a comprehensive framework for discovering optimal insertion points, profiling schemes, and exporting quantized models.

Changes

Cohort / File(s)	Summary
Module Initialization `modelopt/onnx/quantization/autotune/__init__.py`	Exposes public API surface: QDQAutotuner class, configuration/exception types (Config, InsertionScheme, PatternSchemes, RegionType), insertion point abstractions, and utility classes (PatternCache, Region, RegionPattern, CombinedRegionSearch).
Core Autotuner Implementation `modelopt/onnx/quantization/autotune/autotuner.py`	Implements QDQAutotunerBase and QDQAutotuner with region discovery, pattern-based Q/DQ insertion logic, profiling workflow, state management, graph mutation, insertion point resolution, and ONNX export capabilities. Supports scheme generation, convergence tracking, FP8 conversion, and pattern cache integration.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Autotuner as QDQAutotuner
    participant RegionSearch as CombinedRegionSearch
    participant Profiler as Profiling System
    participant Inserter as Q/DQ Insertion
    participant Exporter as ONNX Exporter

    User->>Autotuner: initialize(config, pattern_cache)
    Autotuner->>Autotuner: Load model & init state

    User->>RegionSearch: discover regions
    RegionSearch-->>Autotuner: return regions

    loop For each region
        User->>Autotuner: set_profile_region(region)
        Autotuner->>Autotuner: Commit profiling outcomes
        Autotuner->>Profiler: Prepare region-pattern pairs
        
        loop Generate candidates
            User->>Autotuner: generate()
            Autotuner->>Inserter: Build insertion scheme
            Inserter->>Inserter: Insert Q/DQ nodes
            User->>Autotuner: submit(latency_ms)
            Autotuner->>Autotuner: Track performance metrics
        end
    end

    User->>Autotuner: export_onnx(best=True)
    Autotuner->>Inserter: Apply best scheme
    Inserter->>Exporter: Finalize Q/DQ graph
    Exporter-->>User: return quantized ONNX bytes

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Integrate Automated QDQ autotuner - part 3.2' accurately describes the PR's main objective: integrating the QDQAutotuner class implementation into the codebase as part 3.2 of a larger feature series.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Fix all issues with AI agents

In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 1024-1029: The try/except around graph.cleanup().toposort()
swallows all exceptions (except Exception as e) and merely logs a warning, which
can hide serious graph corruption; update the handler in autotuner.py to either
catch only expected exception types (e.g., specific cleanup/toposort exceptions)
or log the error and re-raise it so execution stops on unexpected failures —
locate the graph.cleanup().toposort() call and replace the broad except with
either a narrowed except for known recoverable exceptions or add a raise after
logger.warning/failure log to propagate the error.
- Line 622: Remove the redundant local import "from datetime import datetime"
(the one added at line with the single import statement) in autotuner.py; the
module already imports datetime at the top of the file, so delete this local
import to avoid duplication and potential shadowing (look for the statement
"from datetime import datetime" inside the function or block and remove it).
- Around line 912-918: The zero-point arrays q_zp_values (and the corresponding
dq_zp_values) are created with a hardcoded dtype np.int8 which can mismatch the
QuantizeLinear/DequantizeLinear output type when quant_type is "uint8" or other
types; update their construction to use the same dtype as the computed
quant_dtype instead of np.int8 so q_zp_values and dq_zp_values match the
quantized output element type used when building q_inputs and dq_inputs (refer
to q_scale_values, q_zp_values, q_inputs and the corresponding dq_* variables to
locate where to change the dtype).
- Around line 1013-1021: The import of get_tensor_consumer_node_indices is wrong
and causes an import error; replace that import with get_tensor_consumer_nodes
and update any usage names accordingly (the code that uses tensor_users_map
already expects a defaultdict(list) so no KeyError handling is needed).
Specifically, change the symbol imported from
modelopt.onnx.quantization.graph_utils from get_tensor_consumer_node_indices to
get_tensor_consumer_nodes and ensure tensor_users_map is assigned from
get_tensor_consumer_nodes(...) where used in the autotuner (references:
get_tensor_consumer_node_indices, get_tensor_consumer_nodes, tensor_users_map).

🧹 Nitpick comments (4)

modelopt/onnx/quantization/autotune/autotuner.py (4)

229-229: Consider defining config attributes explicitly.

Using getattr(self.config, "maximum_generation_attempts", 100) with defaults (also seen at lines 718-719 and 744) suggests these attributes may not be formally defined on the Config class. This pattern makes it harder to discover available configuration options.

💡 Suggestion

Consider adding these attributes to the Config class with documented defaults rather than relying on getattr fallbacks:

# In Config class
maximum_generation_attempts: int = 100
top_percent_to_mutate: float = 0.1
minimum_schemes_to_mutate: int = 1
maximum_mutations: int = 3

333-335: Replace assertions with explicit checks for runtime validation.

Assertions on lines 333-335 (and similarly at line 314) are used for validating runtime conditions. Since assertions can be disabled with python -O, these should be explicit checks for production code.

🛡️ Proposed fix

-                full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph)
-                assert full_insertion_scheme is not None
-                all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme)
-                assert isinstance(all_region_ips, set)
+                full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph)
+                if full_insertion_scheme is None:
+                    logger.warning(f"Failed to get full insertion scheme for region {region.id}")
+                    continue
+                all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme)
+                if not isinstance(all_region_ips, set):
+                    raise TypeError(f"Expected set from pattern.matches, got {type(all_region_ips)}")

972-985: Assertions used for critical runtime validation.

These assertions validate critical invariants (node index bounds, input index bounds, tensor name matching) but can be disabled with python -O. Consider using explicit checks with ValueError/IndexError for production safety.

🛡️ Proposed fix

             if node_index is not None:
-                assert node_index < len(graph.nodes), "Node index out of range"
+                if node_index >= len(graph.nodes):
+                    raise IndexError(f"Node index {node_index} out of range (max: {len(graph.nodes) - 1})")
                 target_node = graph.nodes[node_index]
-                assert input_index is not None, "Input index must be set when node index is set"
-                assert input_index < len(target_node.inputs), (
-                    f"Input index out of range for node {target_node.name}"
-                )
+                if input_index is None:
+                    raise ValueError("Input index must be set when node index is set")
+                if input_index >= len(target_node.inputs):
+                    raise IndexError(f"Input index {input_index} out of range for node {target_node.name}")
                 original_tensor = target_node.inputs[input_index]
-                assert tensor_name == original_tensor.name, (
-                    f"Tensor name mismatch for node {target_node.name} input {input_index}"
-                )
+                if tensor_name != original_tensor.name:
+                    raise ValueError(f"Tensor name mismatch: expected '{tensor_name}', got '{original_tensor.name}'")
             else:
-                assert tensor_name in tensor_map, f"Tensor {tensor_name} not found in tensor map"
-                assert input_index is None, "Input index must be None when node index is None"
+                if tensor_name not in tensor_map:
+                    raise KeyError(f"Tensor {tensor_name} not found in tensor map")
+                if input_index is not None:
+                    raise ValueError("Input index must be None when node index is None")

1042-1049: Consider iterative approach for deep region hierarchies.

_visit_region_recursively uses recursion which could hit Python's stack limit for very deep region hierarchies. While this is unlikely for typical ONNX models, an iterative approach would be more robust.

♻️ Iterative alternative

def _visit_region_recursively(self, region: Region) -> list[Region]:
    """Iteratively traverse region hierarchy and collect all regions."""
    regions = []
    stack = [region]
    while stack:
        current = stack.pop()
        regions.append(current)
        stack.extend(current.get_children())
    return regions

modelopt/onnx/quantization/autotune/autotuner.py

tests/unit/onnx/quantization/autotune/autotune/test_autotuner.py

ajrasane · 2026-02-12T09:33:41Z

/ok to test b02fef1

modelopt/onnx/quantization/autotune/autotuner.py

gcunhase · 2026-02-18T20:33:19Z

/ok to test 68ffa5a

gcunhase · 2026-02-19T20:01:58Z

Tests are failing because common.py is in #839. We may need to move it here to pass the CICD tests. @willg-nv

willg-nv · 2026-02-23T02:21:10Z

Tests are failing because common.py is in #839. We may need to move it here to pass the CICD tests. @willg-nv

I have fixed the test failures, please help rerun the job, thanks!

gcunhase · 2026-02-24T01:04:04Z

/ok to test 6124ac9

modelopt/onnx/quantization/autotune/autotuner.py

Signed-off-by: Will Guo <willg@nvidia.com>

modelopt-bot · 2026-02-25T18:21:34Z

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core QDQAutotuner class. The code is well-structured with good documentation and tests, but several critical and high-priority issues need fixing.

🚨 Critical Issues

1. Import Error:

File: (line 44)

⚠️ High Priority

2. Duplicate datetime import

File: (line 622)

Remove - already imported at module level.

3. Broad exception handling

File: (lines 1024-1029)

Narrow to specific exceptions or add after logging.

4. Zero-point dtype hardcoded to int8

File: (lines 912-918)

🔧 Medium Priority

5. Config attributes not explicitly defined

File: (lines 229, 718, 744)

Using - define these in class:

6. Assertions for runtime validation

File: (lines 314-316, 333-335, 972-985)

Replace assertions with explicit exceptions (asserts are disabled with ).

💡 Simplification Suggestions

File size: is 1,320 lines - consider splitting:

- Base class
- Main implementation
- Q/DQ insertion logic

Recursive traversal: (line 1042) - consider iterative approach for deep hierarchies.

✅ Recommendations

Before merge:

Fix import error (critical)
Remove duplicate datetime import
Fix zero-point dtype
Narrow exception handling

Nice to have:
5. Define Config attributes explicitly
6. Convert assertions to exceptions
7. Consider file refactoring

cc: @willg-nv @gcunhase

gcunhase · 2026-02-25T18:26:47Z

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core class that drives the QDQ autotuning workflow. The code is generally well-structured with comprehensive documentation and tests, but several issues need addressing before merge.

✅ What's Good

Well-documented - Extensive docstrings explaining the workflow

Comprehensive tests - Good coverage of initialization, region discovery, export

Clean architecture - Good separation between and

State persistence - YAML save/load for reproducibility is well-implemented

🚨 Critical Issues (Must Fix)

1. Import Error:

Location: line 44

This will cause an ImportError at runtime. Based on CodeRabbit's review, the function name should be .

⚠️ High Priority Issues

2. Duplicate import

Location: line 622

There's a local import inside a method, but is already imported at module level. Remove the duplicate.

3. Broad exception handling in graph cleanup

Location: lines 1024-1029

This swallows all exceptions. Either catch only expected exception types (e.g., specific cleanup/toposort exceptions) or log and re-raise.

4. Zero-point dtype mismatch

Location: lines 912-918

This doesn't respect the configured . Should use variable instead.

🔧 Medium Priority Issues

5. anti-pattern for config attributes

Location: Lines 229, 718, 744

Using suggests these may not be defined in . Define them explicitly with proper defaults:

6. Assertions used for runtime validation

Location: Lines 314-316, 333-335, 972-985

Multiple statements for runtime validation. These are disabled with . Convert to explicit exceptions:

💡 Simplification Suggestions

Component Verdict Notes
✅ Keep Good abstraction
✅ Keep Main implementation
✅ Keep Necessary for scheme management
✅ Keep Good for warm-starting
Config expansion ⚠️ Clean up Some fields added via
File size concern: is 1,320 lines. Consider splitting:

Base class and utilities

Main QDQAutotuner class

Q/DQ insertion and ONNX export logic

Recursive traversal concern: (line 1042) uses recursion which could hit stack limits for deep hierarchies. Consider iterative approach:

✅ Recommendations

Before merge:

✅ Fix import: →

✅ Remove duplicate import

✅ Fix zero-point dtype to respect

✅ Narrow exception handling in graph cleanup

Nice to have: 5. Define all Config attributes explicitly (remove fallbacks) 6. Convert assertions to explicit exceptions 7. Consider breaking into smaller files

Overall: The PR is well-implemented and necessary for the QDQ autotuning workflow. Fix the critical import error before merge, then address high-priority items.

cc: @willg-nv @gcunhase

@willg-nv please address the critical and high priority issues raised above before merging. Thanks.

modelopt-bot · 2026-02-25T18:33:41Z

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core QDQAutotuner class. The code is well-structured with good documentation and tests, but several critical and high-priority issues need fixing.

🚨 Critical Issues

1. Import Error: `get_tensor_consumer_node_indices`

File: modelopt/onnx/quantization/autotune/autotuner.py (line 44)

# Current (will cause ImportError):
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices

# Should be:
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_nodes

⚠️ High Priority

2. Duplicate datetime import

File: modelopt/onnx/quantization/autotune/autotuner.py (line 622)

Remove from datetime import datetime - already imported at module level.

3. Broad exception handling

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 1024-1029)

except Exception as e:  # Too broad - catches KeyboardInterrupt, SystemExit, etc.

Narrow to specific exceptions or add raise after logging.

4. Zero-point dtype hardcoded to int8

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 912-918)

q_zp_values = np.array([q_zero_point], dtype=np.int8)  # Hardcoded
# Should use: dtype=quant_dtype

🔧 Medium Priority

5. Config attributes not explicitly defined

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 229, 718, 744)

Using getattr(self.config, "attr", default) - define these in Config class:

maximum_generation_attempts
top_percent_to_mutate
minimum_schemes_to_mutate
maximum_mutations

6. Assertions for runtime validation

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 314-316, 333-335, 972-985)

Replace assertions with explicit exceptions (asserts are disabled with python -O).

💡 Simplification Suggestions

File size: autotuner.py is 1,320 lines - consider splitting:

autotuner_base.py - Base class
autotuner.py - Main implementation
export_utils.py - Q/DQ insertion logic

Recursive traversal: _visit_region_recursively (line 1042) - consider iterative approach for deep hierarchies.

✅ Recommendations

Before merge:

Fix import error (critical)
Remove duplicate datetime import
Fix zero-point dtype
Narrow exception handling

Nice to have:
5. Define Config attributes explicitly
6. Convert assertions to exceptions
7. Consider file refactoring

cc: @willg-nv @gcunhase

cjluo-nv · 2026-02-25T18:37:37Z

autotuner_base.py - Base class

autotuner.py - Main implementation

export_utils.py - Q/DQ insertion logic

@willg-nv could you fix 1,2,4 and split the autotuner.py as suggested?

Signed-off-by: Will Guo <willg@nvidia.com>

willg-nv · 2026-02-26T02:27:59Z

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core QDQAutotuner class. The code is well-structured with good documentation and tests, but several critical and high-priority issues need fixing.

🚨 Critical Issues

1. Import Error: get_tensor_consumer_node_indices

File: modelopt/onnx/quantization/autotune/autotuner.py (line 44)
# Current (will cause ImportError):
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices

# Should be:
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_nodes
⚠️ High Priority

2. Duplicate datetime import

File: modelopt/onnx/quantization/autotune/autotuner.py (line 622)

Remove from datetime import datetime - already imported at module level.

3. Broad exception handling

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 1024-1029)
except Exception as e:  # Too broad - catches KeyboardInterrupt, SystemExit, etc.
Narrow to specific exceptions or add raise after logging.

4. Zero-point dtype hardcoded to int8

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 912-918)
q_zp_values = np.array([q_zero_point], dtype=np.int8)  # Hardcoded
# Should use: dtype=quant_dtype
🔧 Medium Priority

5. Config attributes not explicitly defined

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 229, 718, 744)

Using getattr(self.config, "attr", default) - define these in Config class:

maximum_generation_attempts

top_percent_to_mutate

minimum_schemes_to_mutate

maximum_mutations

6. Assertions for runtime validation

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 314-316, 333-335, 972-985)

Replace assertions with explicit exceptions (asserts are disabled with python -O).

💡 Simplification Suggestions

File size: autotuner.py is 1,320 lines - consider splitting:

autotuner_base.py - Base class

autotuner.py - Main implementation

export_utils.py - Q/DQ insertion logic

Recursive traversal: _visit_region_recursively (line 1042) - consider iterative approach for deep hierarchies.

✅ Recommendations

Before merge:

Fix import error (critical)

Remove duplicate datetime import

Fix zero-point dtype

Narrow exception handling

Nice to have: 5. Define Config attributes explicitly 6. Convert assertions to exceptions 7. Consider file refactoring

cc: @willg-nv @gcunhase

1, I think this is a false alarm for get_tensor_consumer_node_indices is in current repo. So there should be no ImportError

2: this issue does not exists in latest code.
3. fixed
4. this is false alarm.
5. fixed.
6. simplification suggestions applied

willg-nv · 2026-02-26T02:29:03Z

autotuner_base.py - Base class

autotuner.py - Main implementation

export_utils.py - Q/DQ insertion logic

@willg-nv could you fix 1,2,4 and split the autotuner.py as suggested?

please check comment #838 (comment)

cjluo-nv · 2026-02-26T07:02:10Z

/ok to test c99edab

Signed-off-by: Will Guo <willg@nvidia.com>

modelopt-bot · 2026-02-26T07:38:32Z

Security Review: PRs #838 and #839

Summary

Both PRs have been reviewed against the security checklist. One security issue was identified in PR #839 with a # nosec comment bypassing Bandit security checks.

PR #838 - Integrate Automated QDQ autotuner - part 3.2

Security Check Results

Check	Status	Details
torch.load(..., weights_only=False)	✅ PASS	Not used in this PR
numpy.load(..., allow_pickle=True)	✅ PASS	Not used in this PR
trust_remote_code=True	✅ PASS	Not used in this PR
eval() / exec() on external input	✅ PASS	Not found
# nosec bypass	✅ PASS	Not found

Code Review

The PR uses yaml.safe_load() which is the correct secure method for loading YAML files. No pickle usage, torch loading with unsafe settings, or code execution vulnerabilities were found.

Status: No security issues identified.

PR #839 - Integrate Automated QDQ placement tool - part 3.3

🔴 CRITICAL Security Issue Found

Issue: Unauthorized use of `# nosec` comment

File: modelopt/onnx/quantization/autotune/__main__.py
Line: ~line 32 (in DEFAULT_TIMING_CACHE definition)
Code:

DEFAULT_TIMING_CACHE = "/tmp/trtexec_timing.cache"  # nosec B108

Security Check Results

Check	Status	Details
torch.load(..., weights_only=False)	✅ PASS	Not used in this PR
numpy.load(..., allow_pickle=True)	✅ PASS	Not used in this PR
trust_remote_code=True	✅ PASS	Not used in this PR
eval() / exec() on external input	✅ PASS	Not found
# nosec bypass	🔴 FAIL	Found unauthorized `# nosec B108` comment

Explanation

The # nosec B108 comment is used to bypass Bandit's B108 security check (probable insecure temporary file creation). According to the security policy:

"If a security-sensitive pattern is genuinely necessary, the PR must be reviewed and approved by @NVIDIA/modelopt-setup-codeowners with an explicit justification in the PR description."

This PR does not include:

Approval from @NVIDIA/modelopt-setup-codeowners
An explicit justification in the PR description for why this bypass is safe

Recommended Fix

# Option 1: Remove the nosec comment and use a safer path
DEFAULT_TIMING_CACHE = os.path.join(tempfile.gettempdir(), "trtexec_timing.cache")

# Option 2: If B108 bypass is genuinely required, add justification:
# nosec B108: This path is used for TensorRT timing cache which is 
# created by trtexec and does not contain sensitive data.
# Reviewed and approved by: @NVIDIA/modelopt-setup-codeowners
DEFAULT_TIMING_CACHE = "/tmp/trtexec_timing.cache"

Additional Security Notes

The PR uses yaml.safe_load() in PatternCache.load() and state loading, which is the secure method. No other security issues were identified.

Overall Recommendation

PR	Status	Required Action
#838	✅ Approved	No security issues; can merge
#839	🔴 Request Changes	Remove or justify `# nosec B108` comment with approval from @NVIDIA/modelopt-setup-codeowners

cc: @willg-nv @gcunhase @NVIDIA/modelopt-setup-codeowners

willg-nv · 2026-02-26T07:42:01Z

Security Review: PRs #838 and #839

Summary

Both PRs have been reviewed against the security checklist. One security issue was identified in PR #839 with a # nosec comment bypassing Bandit security checks.

PR #838 - Integrate Automated QDQ autotuner - part 3.2

Security Check Results

Check Status Details
torch.load(..., weights_only=False) ✅ PASS Not used in this PR
numpy.load(..., allow_pickle=True) ✅ PASS Not used in this PR
trust_remote_code=True ✅ PASS Not used in this PR
eval() / exec() on external input ✅ PASS Not found

nosec bypass ✅ PASS Not found

Code Review

The PR uses yaml.safe_load() which is the correct secure method for loading YAML files. No pickle usage, torch loading with unsafe settings, or code execution vulnerabilities were found.

Status: No security issues identified.

PR #839 - Integrate Automated QDQ placement tool - part 3.3

🔴 CRITICAL Security Issue Found

Issue: Unauthorized use of # nosec comment

File: modelopt/onnx/quantization/autotune/__main__.py Line: ~line 32 (in DEFAULT_TIMING_CACHE definition) Code:
DEFAULT_TIMING_CACHE = "/tmp/trtexec_timing.cache"  # nosec B108

modelopt/onnx/quantization/autotune/main.py is not in this PR

codecov · 2026-02-26T16:13:58Z

Codecov Report

❌ Patch coverage is 73.15855% with 215 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.19%. Comparing base (d78797b) to head (c1f93b2).
⚠️ Report is 10 commits behind head on main.

Files with missing lines	Patch %	Lines
...elopt/onnx/quantization/autotune/autotuner_base.py	75.63%	105 Missing ⚠️
modelopt/onnx/quantization/autotune/common.py	60.45%	70 Missing ⚠️
...odelopt/onnx/quantization/autotune/export_utils.py	74.34%	39 Missing ⚠️
modelopt/onnx/quantization/autotune/autotuner.py	97.29%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #838      +/-   ##
==========================================
- Coverage   73.09%   72.19%   -0.90%     
==========================================
  Files         205      210       +5     
  Lines       22301    23488    +1187     
==========================================
+ Hits        16300    16957     +657     
- Misses       6001     6531     +530

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Will Guo <willg@nvidia.com>

cjluo-nv · 2026-02-26T18:08:14Z

/ok to test c1f93b2

willg-nv · 2026-02-27T01:47:25Z

/ok to test c1f93b2

## What does this PR do? This PR implements QDQ autotuner CLI. This is the initial version of CLI, it will be integrated to modelopt.onnx.quantization.autotune. Usage: ``` python -m modelopt.onnx.quantization.autotune --onnx_path model.onnx --schemes_per_region 50 --pattern_cache cache.yaml --qdq_baseline baseline.onnx --quant_type int8 --verbose ``` PR 3.1: #837 PR 3.2 #838 PR 3.3: #839 **Overview:** ? ## Before your PR is "*Ready for review*"  - **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed. - **Is this change backward compatible?**: Yes - **Did you write any new necessary tests?**: No - **Did you add or update any necessary documentation?**: Document will be added in part 4. - **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: CHANGE log will be added in part 4. ## Additional Information   ## Summary by CodeRabbit ## Release Notes * **New Features** * Added a command-line interface for ONNX quantization autotuning with configurable parameters for models, output paths, quantization strategies, and TensorRT benchmarking. * Introduced an automated workflow for pattern-based region optimization with state management, baseline comparison, and benchmarking capabilities. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub>  --------- Signed-off-by: Will Guo <willg@nvidia.com>

willg-nv requested a review from a team as a code owner February 2, 2026 02:58

willg-nv requested a review from ajrasane February 2, 2026 02:58

coderabbitai bot reviewed Feb 2, 2026

View reviewed changes

This was referenced Feb 2, 2026

Integrate Automated QDQ benchmark - part 3.1 #837

Merged

Integrate Automated QDQ placement tool - part 3.3 #839

Merged

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.2 branch 3 times, most recently from 1ffcf7f to bd18dfa Compare February 9, 2026 08:36

gcunhase reviewed Feb 10, 2026

View reviewed changes

tests/unit/onnx/quantization/autotune/autotune/test_autotuner.py Outdated Show resolved Hide resolved